Results 1 - 20 of 28,455
1.
Ann Plast Surg ; 92(4S Suppl 2): S101-S104, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38556656

ABSTRACT

BACKGROUND: Pharyngeal flap (PF) surgery is effective at improving velopharyngeal sufficiency, but historical literature shows a concerning prevalence rate of obstructive sleep apnea (OSA), reported as high as 20%. Our institution has developed a protocol to minimize risk of postoperative obstructive complications and increase safety of PF surgery. We hypothesize that (1) preoperative staged removal of significant adenotonsillar tissue along with (2) multiview videofluoroscopy to guide patient-specific surgical approach via appropriately sized PFs can result in excellent speech outcomes while limiting occurrence of OSA. METHODS: This was a retrospective chart review of all patients with velopharyngeal insufficiency (VPI) (aged 2-20 years) seen at the University of Rochester from 2015 to 2022 undergoing PF surgery to correct VPI. Nasopharyngoscopy was used for surgical planning and airway evaluation. Patients with tonsillar and adenoid hypertrophy underwent staged adenotonsillectomy at least 2 months before PF. Multiview videofluoroscopy was used to identify anatomic causes of VPI and to determine PF width. Patients underwent polysomnography and speech evaluation before and at least 6 months after PF surgery. RESULTS: Forty-one children aged 8.5 ± 4.1 years (range, 4 to 18 years) who underwent posterior PF surgery for VPI were identified. This included 10 patients with 22q11.2 deletion and 4 patients with Pierre Robin sequence. Thirty-nine patients had both pre- and postoperative speech data and underwent both a pre- and postoperative sleep study. Polysomnography showed no significant difference in obstructive apnea-hypopnea index after posterior PF surgery (obstructive apnea-hypopnea index preop, 1.3 ± 1.2 events per hour; postop, 1.7 ± 2.1 events per hour; P = 0.111). Significant improvements in speech outcome were seen in patients who underwent PF (modified Pittsburgh score preop, 11.52 ± 1.37; postop, 1.09 ± 2.35; P < 0.05). CONCLUSIONS: Use of preoperative staged adenotonsillectomy as well as patient-specific PF dimensions results in effective resolution of VPI and a low risk of OSA.


Subject(s)
Sleep Apnea, Obstructive; Velopharyngeal Insufficiency; Child; Humans; Speech; Retrospective Studies; Critical Pathways; Pharynx/surgery; Velopharyngeal Insufficiency/surgery; Velopharyngeal Insufficiency/complications; Sleep Apnea, Obstructive/etiology; Postoperative Complications/epidemiology; Treatment Outcome
2.
Sci Rep ; 14(1): 7697, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565624

ABSTRACT

The rapid increase in biomedical publications necessitates efficient systems to automatically handle Biomedical Named Entity Recognition (BioNER) tasks in unstructured text. However, accurately detecting biomedical entities is quite challenging due to the complexity of their names and the frequent use of abbreviations. In this paper, we propose BioBBC, a deep learning (DL) model that utilizes multi-feature embeddings and is built on a BERT-BiLSTM-CRF architecture to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Fields (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned within the text. The embedding layer generates enriched contextual representation vectors of the input by learning the text through four types of embeddings: part-of-speech (POS) tag embeddings, character-level embeddings, BERT embeddings, and data-specific embeddings. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the best possible tag sequence for the input sentence. Our model is well-constructed and well-optimized for detecting different types of biomedical entities. In experimental evaluations, our model outperformed state-of-the-art (SOTA) models with significant improvements on six benchmark BioNER datasets.
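For readers who want the tagging stack made concrete, a minimal PyTorch sketch of the BiLSTM-plus-CRF portion follows. It assumes pre-computed per-token feature vectors (e.g., concatenated BERT, character-level, and POS embeddings) and the third-party pytorch-crf package; all layer sizes and names are illustrative, not taken from the paper.

```python
# Minimal BiLSTM-CRF tagger sketch (illustrative; not the authors' code).
# Assumes pre-computed per-token feature embeddings and the `pytorch-crf`
# package (pip install pytorch-crf).
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRFTagger(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, num_tags: int):
        super().__init__()
        # The BiLSTM adds syntactic/semantic context on top of the embeddings.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)
        # The CRF models tag-transition constraints (e.g., I- cannot follow O).
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, feats, tags, mask):
        h, _ = self.bilstm(feats)
        return -self.crf(self.emissions(h), tags, mask=mask)  # negative log-likelihood

    def decode(self, feats, mask):
        h, _ = self.bilstm(feats)
        return self.crf.decode(self.emissions(h), mask=mask)  # best tag sequences

# Toy usage: batch of 2 sentences, 7 tokens, 804-dim fused embeddings.
model = BiLSTMCRFTagger(feat_dim=804, hidden_dim=256, num_tags=5)
feats = torch.randn(2, 7, 804)
tags = torch.randint(0, 5, (2, 7))
mask = torch.ones(2, 7, dtype=torch.bool)
print(model.loss(feats, tags, mask))   # training objective
print(model.decode(feats, mask))       # predicted BIO tag ids
```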


Subject(s)
Language; Semantics; Natural Language Processing; Benchmarking; Speech
3.
JASA Express Lett ; 4(4), 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38573045

ABSTRACT

The present study examined English vowel recognition in multi-talker babbles (MTBs) in 20 normal-hearing, native-English-speaking adult listeners. Twelve vowels, embedded in the h-V-d structure, were presented in MTBs consisting of 1, 2, 4, 6, 8, 10, and 12 talkers (numbers of talkers [N]) and in a speech-shaped noise at signal-to-noise ratios of -12, -6, and 0 dB. Results showed that vowel recognition performance was a non-monotonic function of N when signal-to-noise ratios were less favorable. The masking effects of MTBs on vowel recognition were most similar to those previously reported for consonant recognition and less similar to those for word and sentence recognition.
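Conditions like these are built by mixing target speech with a masker at a fixed signal-to-noise ratio. A small NumPy sketch of the standard RMS-based mixing follows; the study's exact calibration procedure is not given in the abstract, so this scaling is an assumption.

```python
# Mix a target signal with masker noise at a requested SNR (dB).
# Standard RMS-based scaling; the study's calibration may differ.
import numpy as np

def mix_at_snr(target: np.ndarray, masker: np.ndarray, snr_db: float):
    masker = masker[:len(target)]                  # match lengths
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    # Scale the masker so 20*log10(rms(target)/rms(masker)) == snr_db.
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return target + gain * masker

# Toy example: 1 kHz tone in white noise at -6 dB SNR.
fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 1000 * t)
babble_like = np.random.randn(fs)
mixture = mix_at_snr(speech_like, babble_like, snr_db=-6.0)
```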


Subject(s)
Language; Speech; Adult; Humans; Recognition, Psychology; Signal-To-Noise Ratio
4.
Sci Rep ; 14(1): 8181, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589483

ABSTRACT

Temporal envelope modulations (TEMs) are one of the most important features that cochlear implant (CI) users rely on to understand speech. Electroencephalographic assessment of TEM encoding could help clinicians predict speech recognition more objectively, even in patients unable to provide active feedback. The acoustic change complex (ACC) and the auditory steady-state response (ASSR) evoked by low-frequency amplitude-modulated pulse trains can be used to assess TEM encoding with electrical stimulation of individual CI electrodes. In this study, we focused on amplitude modulation detection (AMD) and amplitude modulation frequency discrimination (AMFD) with stimulation of a basal versus an apical electrode. In twelve adult CI users, we (a) assessed behavioral AMFD thresholds and (b) recorded cortical auditory evoked potentials (CAEPs), AMD-ACC, AMFD-ACC, and ASSR in a combined 3-stimulus paradigm. We found that the electrophysiological responses were significantly higher for apical than for basal stimulation. Peak amplitudes of the AMFD-ACC were small and therefore did not correlate with speech-in-noise recognition. We found significant correlations between speech-in-noise recognition and (a) behavioral AMFD thresholds and (b) AMD-ACC peak amplitudes. AMD and AMFD hold potential for the development of a clinically applicable tool for assessing TEM encoding to predict speech recognition in CI users.


Subject(s)
Cochlear Implantation; Cochlear Implants; Speech Perception; Adult; Humans; Psychoacoustics; Speech Perception/physiology; Speech; Acoustic Stimulation; Evoked Potentials, Auditory/physiology
5.
J Speech Lang Hear Res ; 67(4): 1020-1041, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38557114

ABSTRACT

PURPOSE: The purpose of this study was to identify commonalities and differences between content components in stuttering treatment programs for preschool-age children. METHOD: In this document analysis, a thematic analysis was conducted of the content of handbooks and manuals describing Early Childhood Stuttering Therapy, the Lidcombe Program, Mini-KIDS, Palin Parent-Child Interaction Therapy, RESTART Demands and Capacities Model Method, and the Westmead Program. First, a theoretical framework defining a content component in treatment was developed. Second, we coded and categorized the data following the procedure of reflexive thematic analysis. In addition, the first authors of the treatment documents reviewed the findings of this study, and their feedback was analyzed and taken into consideration. RESULTS: Sixty-one content components within seven themes (interaction, coping, reactions, everyday life, information, language, and speech) were identified across the treatment programs. The content component SLP providing information about the child's stuttering was identified across all treatment programs. All programs are multithematic, and no treatment program has a single focus on speech, language, or parent-child interaction. A comparison of the programs with equal treatment goals highlighted more commonalities in content components across the programs. The differences between the treatment programs were evident both in the number of content components, which varied from 7 to 39, and in the content included in each program. CONCLUSIONS: Only one common content component was identified across programs, and the number and types of components vary widely. The role that the common content component plays in treatment effects is discussed, alongside implications for research and clinical practice. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25457929.


Subject(s)
Stuttering; Humans; Child, Preschool; Stuttering/therapy; Speech Therapy/methods; Document Analysis; Treatment Outcome; Speech
6.
J Speech Lang Hear Res ; 67(4): 1143-1164, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38568053

ABSTRACT

PURPOSE: Connected speech analysis has been effectively utilized for the diagnosis and disease monitoring of individuals with Alzheimer's disease (AD). Existing research has been conducted mostly in monolingual English speakers with a noticeable lack of evidence from bilinguals and non-English speakers, particularly in non-European languages. Using a case study approach, we characterized connected speech profiles of two Bengali-English bilingual speakers with AD to determine the universal features of language impairments in both languages, identify language-specific differences between the languages, and explore language impairment characteristics of the participants with AD in relation to their bilingual language experience. METHOD: Participants included two Bengali-English bilingual speakers with AD and a group of age-, gender-, education-, and language-matched neurologically healthy controls. Connected speech samples were collected in first language (L1; Bengali) and second language (L2; English) using a novel storytelling task (i.e., Frog, Where Are You?). These samples were analyzed using an augmented quantitative production analysis and correct information unit analyses for productivity, fluency, syntactic and morphosyntactic features, and lexical and semantic characteristics. RESULTS: Irrespective of the language, AD impacted speech productivity (speech rate and fluency) and semantic characteristics in both languages. Unique language-specific differences were noted on syntactic measures (reduced sentence length in Bengali), lexical distribution (fewer pronouns and absence of reduplication in Bengali), and inflectional properties (no difficulties with noun or verb inflections in Bengali). Among the two participants with AD, the individual who showed lower proficiency and usage in L2 (English) demonstrated reduced syntactic complexity and morphosyntactic richness in English. CONCLUSIONS: Evidence from these case studies suggests that language impairment features in AD are not universal across languages, particularly in comparison to impairments typically associated with language breakdowns in English. This study underscores the importance of establishing connected speech profiles in AD for non-English-speaking populations, especially for structurally different languages. This would in turn lead to the development of language-specific markers that can facilitate early detection of language deterioration and aid in improving diagnosis of AD in individuals belonging to underserved linguistically diverse populations. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25412458.


Subject(s)
Alzheimer Disease; Language Development Disorders; Multilingualism; Humans; Speech; Language
7.
Lang Speech Hear Serv Sch ; 55(2): 389-393, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38563740

ABSTRACT

PURPOSE: This prologue introduces the forum "Pediatric Feeding Disorder and the School-Based SLP: An Evidence-Based Update for Clinical Practice" and informs the reader of the scope of articles presented. METHOD: The guest prologue author provides a brief history of pediatric feeding and swallowing services in the public-school setting, including previous forums on swallowing and feeding services in the schools (Logemann & O'Toole, 2000; McNeilly & Sheppard, 2008). The concepts that have been learned since the 2008 forum are shared. The contributing authors in the forum are introduced, and a summary is provided for each of the articles. CONCLUSIONS: The articles provide evidence-based information on topics that are uniquely of interest to school-based speech-language pathologists managing pediatric feeding and swallowing in their districts. The topics shared in this forum range from relevant information on anatomy, physiology, developmental milestones, and differential diagnosis to therapeutic practice when identifying and treating pediatric feeding and swallowing in the school setting. The forum also includes focused articles on the necessity of collaboration with families during the treatment process, current information on legal parameters dealing with school-based pediatric feeding disorder services, and a framework for assessment and treating pediatric feeding disorder in the school setting.


Subject(s)
Feeding and Eating Disorders; Speech-Language Pathology; Humans; Child; Pathologists; Speech; Language; Learning; Feeding and Eating Disorders/diagnosis; Feeding and Eating Disorders/therapy
9.
J Acoust Soc Am ; 155(4): 2698-2706, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38639561

ABSTRACT

The notion of the "perceptual center" or "P-center" has been put forward to account for the repeated finding that acoustic and perceived syllable onsets do not necessarily coincide, at least in the perception of simple monosyllables or disyllables. The magnitude of the discrepancy between acoustics and perception (the location of the P-center in the speech signal) has proven difficult to estimate, though acoustic models of the effect do exist. The present study asks whether the P-center effect can be documented in natural connected speech in English and Japanese and examines whether an acoustic model that defines the P-center as the moment of the fastest energy change in a syllabic amplitude envelope adequately reflects the P-center in the two languages. A sensorimotor synchronization paradigm was deployed to address the research questions. The results provide evidence for the existence of the P-center effect in the speech of both languages, while the acoustic P-center model is found to be less applicable to Japanese. Sensorimotor synchronization patterns further suggest that the P-center may reflect perceptual anticipation of a vowel onset.
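The acoustic model mentioned, which places the P-center at the moment of fastest energy change in the syllabic amplitude envelope, is straightforward to sketch. The envelope-extraction details below (rectification plus a 10 Hz low-pass, 4th-order Butterworth) are illustrative assumptions rather than the paper's exact parameters.

```python
# Estimate a P-center as the time of fastest rise in the amplitude
# envelope. The 10 Hz low-pass and filter order are illustrative
# assumptions, not the paper's exact model.
import numpy as np
from scipy.signal import butter, filtfilt

def p_center_time(signal: np.ndarray, fs: float) -> float:
    envelope = np.abs(signal)                      # rectify
    b, a = butter(4, 10 / (fs / 2), btype="low")   # smooth to ~10 Hz
    envelope = filtfilt(b, a, envelope)
    velocity = np.gradient(envelope) * fs          # d(envelope)/dt
    return np.argmax(velocity) / fs                # fastest energy rise (s)

# Toy syllable: a ramped burst 100 ms into a 500 ms clip.
fs = 16000
x = np.zeros(int(0.5 * fs))
onset = int(0.1 * fs)
x[onset:onset + 1600] = np.hanning(1600) * np.random.randn(1600)
print(f"estimated P-center at {p_center_time(x, fs) * 1000:.1f} ms")
```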


Subject(s)
Speech Acoustics; Speech Perception; Humans; Phonetics; Speech; Language
10.
Int J Yoga Therap ; 34(2024), 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38640400

ABSTRACT

A previous study discovered that two speakers with moderate apraxia of speech increased their sequential motion rates after unilateral forced-nostril breathing (UFNB) was practiced as an adjunct to speech-language therapy in an AB repeated-measures design. The current study sought to (1) delineate possible UFNB-plus-practice effects from practice effects alone in motor speech skills; (2) examine the relationships between UFNB integrity, participant-reported stress levels, and motor speech performance; and (3) sample a participant-led UFNB training schedule to contribute to the literature's growing understanding of UFNB dosage. A single-subject (n-of-1 trial) ABAB reversal design was used across four motor speech behaviors. A 60-year-old female with chronic, severe apraxia of speech participated. The researchers developed a breathing app to assess UFNB practice integrity and to administer the Simple Aphasia Stress Scale after each UFNB session. The participant improved from overall severe to moderate apraxia of speech on the Apraxia Battery for Adults. Visual inspection of graphs confirmed robust motor speech practice effects for all variables. Articulatory-kinematic variables demonstrated sensitivity to the UFNB-plus-practice condition and correlated with stress scale scores but not with UFNB integrity scores. The participant achieved 20-minute UFNB sessions 4 times per week. Removal of UFNB during A2 (UFNB withdrawal) and after a 10-day break during B2 (UFNB full dosage) revealed UFNB practice effects on stress scale scores. UFNB with motor speech practice may benefit articulatory-kinematic skills compared to motor speech practice alone. Regular, cumulative UFNB practice appeared to lower self-perceived stress levels. These findings, along with prior work, provide a foundation for further exploring yoga breathing and its use with speakers who have apraxia of speech.


Subject(s)
Aphasia; Apraxias; Yoga; Adult; Female; Humans; Middle Aged; Speech; Apraxias/therapy; Respiration; Aphasia/therapy
11.
PLoS One ; 19(4): e0301514, 2024.
Article in English | MEDLINE | ID: mdl-38564597

ABSTRACT

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
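To make the analysis concrete: a logistic (psychometric) curve is fitted to the proportion of "different" responses as a function of formant shift, and the JND is read from the fitted function. The sketch below uses scipy with toy data; taking the 50% point as the JND is an illustrative criterion and may differ from the study's exact definition.

```python
# Fit a logistic psychometric function to same/different judgments and
# read off a JND. The 50%-"different" criterion is illustrative.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Formant shift of each stimulus pair and the proportion of "different"
# responses at that shift (toy data, units hypothetical).
shift = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
p_diff = np.array([0.04, 0.10, 0.28, 0.55, 0.78, 0.92, 0.97])

(x0, k), _ = curve_fit(logistic, shift, p_diff, p0=[15.0, 0.3])
jnd = x0   # shift at which "different" responses reach 50%
print(f"JND ~ {jnd:.1f} (slope k = {k:.2f})")
```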


Subject(s)
Auditory Cortex; Speech Perception; Humans; Speech/physiology; Speech Perception/physiology; Acoustics; Movement; Phonetics; Speech Acoustics
12.
Int J Pediatr Otorhinolaryngol ; 179: 111940, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38588634

ABSTRACT

OBJECTIVES: Velocardiofacial syndrome, a prevalent microdeletion syndrome occurring in 1 in 2000-4000 live births, is marked by speech and language disorders, notably velopharyngeal dysfunction. This study investigates speech outcomes and nasometric and videofluoroscopic results before and after primary repair of cleft palate using the Sommerlad intravelar veloplasty (SIVV) technique within the Isfahan cleft care team for patients with velocardiofacial syndrome. METHODS: Employing a quasi-experimental design, 19 participants with velocardiofacial syndrome, who underwent primary cleft palate repair by the Isfahan cleft care team, were included through convenience sampling. Perceptual and instrumental outcomes were assessed pre- and post-operatively. Statistical analysis encompassed paired t-tests and the non-parametric Wilcoxon signed-rank test (p < 0.05). RESULTS: The study identified no statistically significant differences between pre- and post-surgical speech outcome parameters and nasalance scores. Nonetheless, a significant difference emerged in the velopharyngeal closure ratio based on fluoroscopic evaluation (p = 0.038). CONCLUSION: The efficacy of the SIVV technique in treating velopharyngeal dysfunction in velocardiofacial syndrome patients is inconclusive, demanding further research. Post-surgical speech outcomes are influenced by surgical technique, hypotonia, apraxia of speech, and surgery timing. Notably, an elevated velopharyngeal valve closure ratio, though anatomically indicative, does not exclusively predict surgical success.


Subject(s)
Cleft Palate; DiGeorge Syndrome; Plastic Surgery Procedures; Velopharyngeal Insufficiency; Humans; Cleft Palate/complications; Cleft Palate/surgery; DiGeorge Syndrome/complications; DiGeorge Syndrome/surgery; Velopharyngeal Insufficiency/surgery; Velopharyngeal Insufficiency/complications; Treatment Outcome; Retrospective Studies; Speech; Palate, Soft/surgery
13.
Cereb Cortex ; 34(4), 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38566511

ABSTRACT

This study investigates neural processes in infant speech processing, with a focus on left frontal brain regions and hemispheric lateralization in Mandarin-speaking infants' acquisition of native tonal categories. We tested 2- to 6-month-old Mandarin learners to explore age-related improvements in tone discrimination, the role of inferior frontal regions in abstract speech category representation, and left-hemisphere lateralization during tone processing. Using a block design, we presented four Mandarin tones via the syllable [ta] and measured oxygenated hemoglobin concentration with functional near-infrared spectroscopy. Results showed age-related improvements in tone discrimination, greater involvement of frontal regions in older infants (indicating the development of abstract tonal representations), and increased bilateral activation mirroring that of native adult Mandarin speakers. These findings contribute to our broader understanding of the relationship between native speech acquisition and infant brain development during the critical period of early language learning.


Subject(s)
Speech Perception; Speech; Adult; Infant; Humans; Aged; Speech Perception/physiology; Pitch Perception/physiology; Language Development; Brain/diagnostic imaging; Brain/physiology
14.
JASA Express Lett ; 4(4), 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38568027

ABSTRACT

This study investigates speech production under various room acoustic conditions in virtual environments, by comparing vocal behavior and the subjective experience of speaking in four real rooms and their audio-visual virtual replicas. Sex differences were explored. Males and females (N = 13) adjusted their voice levels similarly to room acoustic changes in the real rooms, but only males did so in the virtual rooms. Females, however, rated the visual virtual environment as more realistic compared to males. This suggests a discrepancy between sexes regarding the experience of realism in a virtual environment and changes in objective behavioral measures such as voice level.


Subject(s)
Sex Characteristics; Speech; Female; Male; Humans; Acoustics
15.
Elife ; 12, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38577982

ABSTRACT

A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extract word-by-word measures of sentential structure from BERT, a deep language model, which effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography when participants were listening to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, which engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
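Representational similarity analysis, as used here, compares the dissimilarity structure of a model measure (e.g., BERT parse depths) with that of brain responses. A minimal sketch with invented data shapes follows; the study's actual pipeline is spatiotemporally resolved and far more elaborate.

```python
# Minimal representational similarity analysis (RSA): correlate the
# dissimilarity structure of model features with that of brain
# responses. Data shapes are invented for illustration only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_words = 40
model_features = rng.normal(size=(n_words, 8))    # e.g., parse-depth measures
brain_patterns = rng.normal(size=(n_words, 120))  # e.g., sensor activity

# Representational dissimilarity matrices as condensed vectors.
model_rdm = pdist(model_features, metric="correlation")
brain_rdm = pdist(brain_patterns, metric="correlation")

rho, p = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RSA: Spearman rho = {rho:.3f} (p = {p:.3f})")
```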


Subject(s)
Comprehension; Speech; Humans; Brain; Magnetoencephalography/methods; Language
16.
Sensors (Basel) ; 24(7), 2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38610256

ABSTRACT

The ongoing biodiversity crisis, driven by factors such as land-use change and global warming, emphasizes the need for effective ecological monitoring methods. Acoustic monitoring of biodiversity has emerged as an important monitoring tool. Detecting human voices in soundscape monitoring projects is useful both for analyzing human disturbance and for privacy filtering. Despite significant strides in deep learning in recent years, the deployment of large neural networks on compact devices poses challenges due to memory and latency constraints. Our approach focuses on leveraging knowledge distillation techniques to design efficient, lightweight student models for speech detection in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to create compact yet effective student architectures to compare against the larger EcoVAD teacher model, a well-regarded voice detection architecture in eco-acoustic monitoring. The comparative analysis included examining various configurations of the MobileNetV3-Small-Pi-derived student models to identify optimal performance. Additionally, a thorough evaluation of different distillation techniques was conducted to ascertain the most effective method for model selection. Our findings revealed that the distilled models exhibited comparable performance to the EcoVAD teacher model, indicating a promising approach to overcoming computational barriers for real-time ecological monitoring.
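The response-based knowledge distillation underlying this setup trains the small student to match the teacher's softened output distribution. A generic PyTorch sketch of that loss follows; the temperature and weighting are illustrative hyperparameters, and the paper compares several distillation variants beyond this one.

```python
# Generic response-based knowledge distillation loss: blend hard-label
# cross-entropy with KL divergence to the teacher's softened outputs.
# Temperature T and weight alpha are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.5):
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients for the temperature
    return alpha * hard + (1 - alpha) * soft

# Toy usage: batch of 8, binary speech / no-speech decision.
student_logits = torch.randn(8, 2, requires_grad=True)
teacher_logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```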


Subject(s)
Speech; Voice; Humans; Acoustics; Biodiversity; Knowledge
17.
Trends Hear ; 28: 23312165241245240, 2024.
Article in English | MEDLINE | ID: mdl-38613337

ABSTRACT

Listening to speech in noise can require substantial mental effort, even among younger normal-hearing adults. The task-evoked pupil response (TEPR) has been shown to track the increased effort exerted to recognize words or sentences in increasing noise. However, few studies have examined the trajectory of listening effort across longer, more natural, stretches of speech, or the extent to which expectations about upcoming listening difficulty modulate the TEPR. Seventeen younger normal-hearing adults listened to 60-s-long audiobook passages, repeated three times in a row, at two different signal-to-noise ratios (SNRs) while pupil size was recorded. There was a significant interaction between SNR, repetition, and baseline pupil size on sustained listening effort. At lower baseline pupil sizes, potentially reflecting lower attention mobilization, TEPRs were more sustained in the harder SNR condition, particularly when attention mobilization remained low by the third presentation. At intermediate baseline pupil sizes, differences between conditions were largely absent, suggesting these listeners had optimally mobilized their attention for both SNRs. Lastly, at higher baseline pupil sizes, potentially reflecting overmobilization of attention, the effect of SNR was initially reversed for the second and third presentations: participants initially appeared to disengage in the harder SNR condition, resulting in reduced TEPRs that recovered in the second half of the story. Together, these findings suggest that the unfolding of listening effort over time depends critically on the extent to which individuals have successfully mobilized their attention in anticipation of difficult listening conditions.


Subject(s)
Listening Effort; Pupil; Adult; Humans; Signal-To-Noise Ratio; Speech
18.
IEEE J Transl Eng Health Med ; 12: 382-389, 2024.
Article in English | MEDLINE | ID: mdl-38606392

ABSTRACT

Acoustic features extracted from speech can help with the diagnosis of neurological diseases and monitoring of symptoms over time. Temporal segmentation of audio signals into individual words is an important pre-processing step needed prior to extracting acoustic features. Machine learning techniques could be used to automate speech segmentation via automatic speech recognition (ASR) and sequence to sequence alignment. While state-of-the-art ASR models achieve good performance on healthy speech, their performance significantly drops when evaluated on dysarthric speech. Fine-tuning ASR models on impaired speech can improve performance in dysarthric individuals, but it requires representative clinical data, which is difficult to collect and may raise privacy concerns. This study explores the feasibility of using two augmentation methods to increase ASR performance on dysarthric speech: 1) healthy individuals varying their speaking rate and loudness (as is often used in assessments of pathological speech); 2) synthetic speech with variations in speaking rate and accent (to ensure more diverse vocal representations and fairness). Experimental evaluations showed that fine-tuning a pre-trained ASR model with data from these two sources outperformed a model fine-tuned only on real clinical data and matched the performance of a model fine-tuned on the combination of real clinical data and synthetic speech. When evaluated on held-out acoustic data from 24 individuals with various neurological diseases, the best performing model achieved an average word error rate of 5.7% and a mean correct count accuracy of 94.4%. In segmenting the data into individual words, a mean intersection-over-union of 89.2% was obtained against manual parsing (ground truth). It can be concluded that emulated and synthetic augmentations can significantly reduce the need for real clinical data of dysarthric speech when fine-tuning ASR models and, in turn, for speech segmentation.
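The segmentation metric reported, intersection-over-union between predicted and manually parsed word boundaries, reduces to simple interval arithmetic. In the sketch below, index-wise pairing of predicted and reference words is a simplification; a real evaluation would first align the two word sequences.

```python
# Intersection-over-union between predicted and reference word intervals
# (start, end) in seconds. Index-wise pairing is a simplification.
def interval_iou(pred: tuple, ref: tuple) -> float:
    inter = max(0.0, min(pred[1], ref[1]) - max(pred[0], ref[0]))
    union = (pred[1] - pred[0]) + (ref[1] - ref[0]) - inter
    return inter / union if union > 0 else 0.0

# Toy example: three words with slightly offset boundaries.
predicted = [(0.10, 0.48), (0.55, 0.90), (1.02, 1.40)]
reference = [(0.12, 0.50), (0.55, 0.95), (1.00, 1.38)]
ious = [interval_iou(p, r) for p, r in zip(predicted, reference)]
print(f"mean IoU = {sum(ious) / len(ious):.3f}")
```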


Subject(s)
Speech Perception; Speech; Humans; Speech Recognition Software; Dysarthria/diagnosis; Speech Disorders
19.
Anim Cogn ; 27(1): 34, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38625429

ABSTRACT

Humans have an impressive ability to comprehend signal-degraded speech; however, the extent to which comprehension of degraded speech relies on human-specific features of speech perception vs. more general cognitive processes is unknown. Since dogs live alongside humans and regularly hear speech, they can be used as a model to differentiate between these possibilities. One often-studied type of degraded speech is noise-vocoded speech (sometimes thought of as cochlear-implant-simulation speech). Noise-vocoded speech is made by dividing the speech signal into frequency bands (channels), identifying the amplitude envelope of each individual band, and then using these envelopes to modulate bands of noise centered over the same frequency regions; the result is a signal with preserved temporal cues but vastly reduced frequency information. Here, we tested dogs' recognition of familiar words produced in 16-channel vocoded speech. In the first study, dogs heard their names and unfamiliar dogs' names (foils) in vocoded speech as well as natural speech. In the second study, dogs heard 16-channel vocoded speech only. Dogs listened longer to their vocoded name than vocoded foils in both experiments, showing that they can comprehend a 16-channel vocoded version of their name without prior exposure to vocoded speech, and without immediate exposure to the natural-speech version of their name. Dogs' name recognition in the second study was mediated by the number of phonemes in the dogs' name, suggesting that phonological context plays a role in degraded speech comprehension.
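The vocoding recipe spelled out in the abstract translates almost directly into code. Below is a minimal sketch with an assumed log-spaced band layout and Hilbert-based envelope extraction; channel edges and filter order are illustrative choices, not the study's exact parameters.

```python
# Minimal noise vocoder: band-pass the speech, extract each band's
# amplitude envelope, and use it to modulate noise in the same band.
# Band edges (80 Hz - 7 kHz, log-spaced) and filter order are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x: np.ndarray, fs: float, n_channels: int = 16):
    edges = np.geomspace(80, 7000, n_channels + 1)  # assumed band layout
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envelope = np.abs(hilbert(band))          # temporal envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += envelope * noise                   # envelope-modulated noise
    return out / (np.max(np.abs(out)) + 1e-9)     # normalize

# Toy input: an amplitude-modulated tone standing in for speech.
fs = 16000
t = np.arange(fs) / fs
toy_speech = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 3 * t))
vocoded = noise_vocode(toy_speech, fs)
```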


Subject(s)
Speech Perception; Speech; Humans; Animals; Dogs; Cues; Hearing; Linguistics
20.
PLoS One ; 19(4): e0300382, 2024.
Article in English | MEDLINE | ID: mdl-38625991

ABSTRACT

The neural processes underpinning cognition and language development in infancy are of great interest. We investigated EEG power and coherence in infancy, as reflections of the underlying cortical function of single brain regions and of cross-region connectivity, and their relations to cognition and early precursors of speech and language development. EEG recordings were longitudinally collected from 21 infants with typical development between approximately 1 and 7 months of age. We investigated relative band power at 3-6 Hz and 6-9 Hz and EEG coherence in these frequency ranges at 25 electrode pairs that cover key brain regions. A correlation analysis was performed to assess the relationship between EEG measurements across frequency bands and brain regions and raw Bayley cognitive and language developmental scores. In the first months of life, relative band power is not correlated with cognitive and language scales. However, 3-6 Hz coherence between frontoparietal regions is negatively correlated with receptive language scores, and 6-9 Hz coherence between frontoparietal regions is negatively correlated with expressive language scores. The results from this preliminary study contribute to the existing literature on the relationship between electrophysiological development, cognition, and early speech precursors in this age group. Future work should create norm references of early development in these domains that can be compared with infants at risk for neurodevelopmental disabilities.
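Both EEG measures used here can be sketched briefly: relative band power is band-limited PSD divided by broadband PSD, and coherence quantifies frequency-specific coupling between electrode pairs. The Welch settings and broadband range below are illustrative assumptions (only the 3-6 Hz band itself comes from the study).

```python
# Relative band power (band PSD / broadband PSD) and magnitude-squared
# coherence between two electrodes. Welch settings and the broadband
# range are illustrative assumptions, not the study's parameters.
import numpy as np
from scipy.signal import welch, coherence

def relative_band_power(x, fs, band, broadband=(1.0, 30.0)):
    f, psd = welch(x, fs=fs, nperseg=2 * int(fs))
    in_band = (f >= band[0]) & (f < band[1])
    in_broad = (f >= broadband[0]) & (f < broadband[1])
    # Uniform frequency spacing, so summed PSD bins give the power ratio.
    return psd[in_band].sum() / psd[in_broad].sum()

def band_coherence(x, y, fs, band):
    f, cxy = coherence(x, y, fs=fs, nperseg=2 * int(fs))
    in_band = (f >= band[0]) & (f < band[1])
    return cxy[in_band].mean()

# Toy data: two "electrodes" sharing a 5 Hz rhythm plus independent noise.
fs = 250
t = np.arange(30 * fs) / fs
rng = np.random.default_rng(1)
shared = np.sin(2 * np.pi * 5 * t)
frontal = shared + 0.5 * rng.standard_normal(t.size)
parietal = shared + 0.5 * rng.standard_normal(t.size)
print(relative_band_power(frontal, fs, band=(3.0, 6.0)))    # 3-6 Hz share
print(band_coherence(frontal, parietal, fs, band=(3.0, 6.0)))
```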


Subject(s)
Electroencephalography; Speech; Infant; Humans; Electroencephalography/methods; Language Development; Cognition/physiology; Brain